Conversation
| if (file.getFilePath().startsWith(dataset.getPath())) { | ||
| try { | ||
| Path filePath = Paths.get(file.getFilePath()); | ||
| Files.deleteIfExists(filePath); |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 4 days ago
General approach: Wherever a path is derived from user-controlled fileId and prefix, we must (1) construct it using java.nio.file.Path operations, (2) normalize it, and (3) enforce that the resulting absolute path stays within the intended dataset root directory. We should also reject fileId/prefix that contain path traversal patterns (..) or path separators if they are intended to be single path components. This avoids directory traversal and arbitrary file deletion/reading.
Best concrete fix here:
-
In
DatasetFileApplicationService.getDatasetFile(Dataset dataset, String fileId, String prefix):- Treat
fileIdandprefixas relative components underdataset.getPath(). - Build the path using
Paths.get(dataset.getPath()).resolve(prefix).resolve(fileId).normalize().toAbsolutePath(). - Compute
Path datasetRoot = Paths.get(dataset.getPath()).normalize().toAbsolutePath();and check thatfilePath.startsWith(datasetRoot). If not, throwIllegalArgumentException. - Optionally ensure
fileIddoes not contain directory traversal markers by checking for..,/,\based on expected semantics. Given existing behavior allows directory-like use viaprefix, we will at least enforce the normalized containment check, which is stronger and compatible with current semantics. - Set
file.setFilePath(filePath.toString())rather than using string concatenation.
- Treat
-
In
DatasetFileApplicationService.deleteDatasetFile:- Replace the plain
if (file.getFilePath().startsWith(dataset.getPath()))andPaths.get(file.getFilePath())with:- Construct normalized absolute
Path datasetRootandPath filePath. - Verify
filePath.startsWith(datasetRoot)before callingFiles.deleteIfExists.
- Construct normalized absolute
- This guards even database-stored
filePathvalues.
- Replace the plain
-
In
DatasetFileApplicationService.downloadFile(DatasetFile file):- Similarly, retrieve the owning dataset (we do not have that here, and we are not allowed to change other files to pass it in), or at minimum normalize and ensure the path is absolute. Since we cannot change method signature or imports beyond this file, and we don't have
datasethere, we can't fully enforce root containment. However, the most critical uncontrolled path to the filesystem in this alert is deletion; CodeQL points specifically atdeleteDatasetFile. FordownloadFile, the path is taken fromDatasetFile.filePath(which we’ll now normalize/validate when computing it ingetDatasetFile) so mitigating at creation time already reduces risk. We can leavedownloadFilemostly unchanged.
- Similarly, retrieve the owning dataset (we do not have that here, and we are not allowed to change other files to pass it in), or at minimum normalize and ensure the path is absolute. Since we cannot change method signature or imports beyond this file, and we don't have
Implementation details:
- All changes are in
backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java. - We already import
java.nio.file.Pathandjava.nio.file.Paths, so no new imports are needed. - We add logic inside
getDatasetFileto build and validate aPath. - We modify
deleteDatasetFileto use normalizedPathobjects and a robuststartsWithcheck instead of string-basedstartsWith.
| @@ -205,12 +205,19 @@ | ||
| public DatasetFile getDatasetFile(Dataset dataset, String fileId, String prefix) { | ||
| prefix = StringUtils.isBlank(prefix) ? "" : prefix; | ||
| if (dataset != null && !CommonUtils.isUUID(fileId) && !fileId.startsWith(".") && !prefix.startsWith(".")) { | ||
| // 构建并校验文件路径,防止路径遍历 | ||
| Path datasetRoot = Paths.get(dataset.getPath()).normalize().toAbsolutePath(); | ||
| Path filePath = datasetRoot.resolve(prefix).resolve(fileId).normalize().toAbsolutePath(); | ||
| if (!filePath.startsWith(datasetRoot)) { | ||
| throw new IllegalArgumentException("Invalid file path"); | ||
| } | ||
|
|
||
| DatasetFile file = new DatasetFile(); | ||
| file.setId(fileId); | ||
| file.setFileName(fileId); | ||
| file.setDatasetId(dataset.getId()); | ||
| file.setFileSize(0L); | ||
| file.setFilePath(dataset.getPath() + File.separator + prefix + fileId); | ||
| file.setFilePath(filePath.toString()); | ||
| return file; | ||
| } | ||
| DatasetFile file = datasetFileRepository.getById(fileId); | ||
| @@ -237,13 +239,14 @@ | ||
| } | ||
| datasetRepository.updateById(dataset); | ||
| // 删除文件时,上传到数据集中的文件会同时删除数据库中的记录和文件系统中的文件,归集过来的文件仅删除数据库中的记录 | ||
| if (file.getFilePath().startsWith(dataset.getPath())) { | ||
| try { | ||
| Path filePath = Paths.get(file.getFilePath()); | ||
| try { | ||
| Path datasetRoot = Paths.get(dataset.getPath()).normalize().toAbsolutePath(); | ||
| Path filePath = Paths.get(file.getFilePath()).normalize().toAbsolutePath(); | ||
| if (filePath.startsWith(datasetRoot)) { | ||
| Files.deleteIfExists(filePath); | ||
| } catch (IOException ex) { | ||
| throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR); | ||
| } | ||
| } catch (IOException ex) { | ||
| throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR); | ||
| } | ||
| } | ||
|
|
| try { | ||
| Path filePath = Paths.get(file.getFilePath()).normalize(); | ||
| log.info("start download file {}", file.getFilePath()); | ||
| Resource resource = new UrlResource(filePath.toUri()); |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 4 days ago
In general, the problem should be fixed by validating and constraining any user-controlled components before they are used to build a filesystem path, and by ensuring that the final resolved path remains under an expected base directory. Here, that base directory is dataset.getPath(). We need to (1) restrict fileId and prefix when they are used as path components, and (2) enforce that the normalized Path used for deletion and download never escapes the dataset root.
The best minimally invasive fix is:
-
In
getDatasetFile:- Treat the non-UUID case as a “virtual” file that must be relative to the dataset root.
- Build the path using
Paths.get(dataset.getPath()).resolve(...)andnormalize(). - Explicitly reject
fileIdandprefixthat contain".."or path separators, or that start with a separator, because they are intended here as simple names or single-segment prefixes. - After resolving, verify that the resolved path still starts with the normalized dataset root path; if not, throw
IllegalArgumentException.
-
In
downloadFile:- Normalize the path, resolve it under a trusted base (here, we can use
Paths.get(dataset.getPath())conceptually, but since we only have aDatasetFile, we need to ensure thatfile.getFilePath()saved from the DB is also safe. We can at least normalize it and fail if it’s not absolute or if it contains..in an unsafe way.) - However, the synthetic non-UUID branch will already have guaranteed containment, and DB paths should already be under the dataset root; we’ll additionally normalize here to make traversal via odd sequences impossible.
- Normalize the path, resolve it under a trusted base (here, we can use
-
In
deleteDatasetFile:- Replace the raw
Paths.get(file.getFilePath())with normalization and containment validation (similar togetDatasetFile), rejecting any path that escapesdataset.getPath()before deleting.
- Replace the raw
We are only allowed to change the provided snippets, so we’ll implement validation and containment checks directly in DatasetFileApplicationService.getDatasetFile, deleteDatasetFile, and downloadFile. No new external dependencies are required; we can use existing java.nio.file.Path, Paths, and string checks.
| @@ -204,13 +204,36 @@ | ||
| @Transactional(readOnly = true) | ||
| public DatasetFile getDatasetFile(Dataset dataset, String fileId, String prefix) { | ||
| prefix = StringUtils.isBlank(prefix) ? "" : prefix; | ||
| if (dataset != null && !CommonUtils.isUUID(fileId) && !fileId.startsWith(".") && !prefix.startsWith(".")) { | ||
| if (dataset != null && !CommonUtils.isUUID(fileId)) { | ||
| // When fileId is not a UUID, treat it as a relative path component under the dataset path. | ||
| // Validate that fileId and prefix do not contain path traversal or separators. | ||
| if (fileId.startsWith(".") | ||
| || prefix.startsWith(".") | ||
| || fileId.contains("..") | ||
| || prefix.contains("..") | ||
| || fileId.contains("/") || fileId.contains("\\") | ||
| || prefix.contains("/") || prefix.contains("\\")) { | ||
| throw new IllegalArgumentException("Invalid file identifier or prefix"); | ||
| } | ||
|
|
||
| Path datasetRoot = Paths.get(dataset.getPath()).normalize(); | ||
| // Build relative path using Path APIs to avoid simple string concatenation issues | ||
| Path relativePath = StringUtils.isBlank(prefix) | ||
| ? Paths.get(fileId) | ||
| : Paths.get(prefix, fileId); | ||
| Path resolvedPath = datasetRoot.resolve(relativePath).normalize(); | ||
|
|
||
| // Ensure the resolved path stays within the dataset root directory | ||
| if (!resolvedPath.startsWith(datasetRoot)) { | ||
| throw new IllegalArgumentException("File path escapes dataset root"); | ||
| } | ||
|
|
||
| DatasetFile file = new DatasetFile(); | ||
| file.setId(fileId); | ||
| file.setFileName(fileId); | ||
| file.setDatasetId(dataset.getId()); | ||
| file.setFileSize(0L); | ||
| file.setFilePath(dataset.getPath() + File.separator + prefix + fileId); | ||
| file.setFilePath(resolvedPath.toString()); | ||
| return file; | ||
| } | ||
| DatasetFile file = datasetFileRepository.getById(fileId); | ||
| @@ -237,10 +255,14 @@ | ||
| } | ||
| datasetRepository.updateById(dataset); | ||
| // 删除文件时,上传到数据集中的文件会同时删除数据库中的记录和文件系统中的文件,归集过来的文件仅删除数据库中的记录 | ||
| if (file.getFilePath().startsWith(dataset.getPath())) { | ||
| if (file.getFilePath() != null) { | ||
| try { | ||
| Path filePath = Paths.get(file.getFilePath()); | ||
| Files.deleteIfExists(filePath); | ||
| Path datasetRoot = Paths.get(dataset.getPath()).normalize(); | ||
| Path filePath = Paths.get(file.getFilePath()).normalize(); | ||
| // Only delete files that are inside the dataset root directory | ||
| if (filePath.startsWith(datasetRoot)) { | ||
| Files.deleteIfExists(filePath); | ||
| } | ||
| } catch (IOException ex) { | ||
| throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR); | ||
| } |
| Path target = Paths.get(targetPath).normalize(); | ||
|
|
||
| // 检查源文件是否存在且为普通文件 | ||
| if (!Files.exists(source) || !Files.isRegularFile(source)) { |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 4 days ago
In general, to fix uncontrolled path usage, you must (1) define one or more trusted base directories, (2) ensure that any path provided by the client is treated strictly as a relative path segment or is validated/normalized, and (3) reject or constrain any path that escapes the allowed base directories (for example by containing .. or being absolute). For simple file names, you can forbid separators entirely; for hierarchical paths, you can use Path.normalize() and check that the resulting path is still beneath a trusted root.
In this code, the risk comes from file.getFilePath() being used directly as sourPath for filesystem operations. We should ensure that the resolved source file that will be linked/copied is inside a safe area, which in this system is naturally the dataset directory. We already know the dataset’s root path (dataset.getPath()), and we build the dataset’s target file path from dataset.getPath() and req.getPrefix(). The safest, minimal-change fix is:
- In
getDatasetFileForAdd, compute the dataset directory root (Path datasetRoot = Paths.get(dataset.getPath()).normalize().toAbsolutePath();) and the requested prefix path under it (datasetRoot.resolve(req.getPrefix()).normalize()), then verify that this directory stays withindatasetRoot. This guards againstprefixescaping the dataset folder. - When validating the source path in
addFile, ensure that the normalizedsourceremains insidedatasetRoot(passed from the caller), and reject it if not. Alternatively, if the intended design is thatfilePathis an absolute server-side path into a known staging area, we would instead validate it against that staging base; but we are not shown such a base, and we are already conceptually operating in terms of the dataset path, so constraining bothprefixandfilePathto the dataset area is the least disruptive, safest assumption.
Concretely, we will:
- Change the signature of
addFileto also accept the dataset root path (string) and perform a base-directory check forsource. - Update the call to
addFileinaddFilesToDataset(line 851) to passdataset.getPath()as the base. - In
addFile, afterPath source = Paths.get(sourPath).normalize();, computePath base = Paths.get(basePath).normalize().toAbsolutePath();and resolvesourceagainstbaseifsourPathis relative, then ensure that the finalsourcepath starts withbase. If the check fails, log and throw a business exception. - In
getDatasetFileForAdd, ensure thatreq.getPrefix()is applied as a path segment underdataset.getPath(), and that the resulting directory path still lies within the dataset root; if not, throw aBusinessExceptionwith an appropriate error code (for exampleDataManagementErrorCode.DIRECTORY_NOT_FOUND).
This keeps the outward behavior (adding files into datasets) the same for valid inputs, but blocks malicious or malformed paths that try to escape the dataset area.
| @@ -848,7 +848,8 @@ | ||
| setDatasetFileId(datasetFile, dataset); | ||
| dataset.addFile(datasetFile); | ||
| addedFiles.add(datasetFile); | ||
| addFile(file.getFilePath(), datasetFile.getFilePath(), req.isSoftAdd()); | ||
| // Ensure that the source file path stays within the dataset directory | ||
| addFile(file.getFilePath(), datasetFile.getFilePath(), req.isSoftAdd(), dataset.getPath()); | ||
| } | ||
| } catch (BusinessException e) { | ||
| throw e; | ||
| @@ -863,13 +864,26 @@ | ||
| return addedFiles; | ||
| } | ||
|
|
||
| private void addFile(String sourPath, String targetPath, boolean softAdd) { | ||
| if (StringUtils.isBlank(sourPath) || StringUtils.isBlank(targetPath)) { | ||
| private void addFile(String sourPath, String targetPath, boolean softAdd, String basePath) { | ||
| if (StringUtils.isBlank(sourPath) || StringUtils.isBlank(targetPath) || StringUtils.isBlank(basePath)) { | ||
| return; | ||
| } | ||
|
|
||
| Path base = Paths.get(basePath).normalize().toAbsolutePath(); | ||
| Path source = Paths.get(sourPath).normalize(); | ||
| if (!source.isAbsolute()) { | ||
| source = base.resolve(source).normalize(); | ||
| } else { | ||
| source = source.toAbsolutePath().normalize(); | ||
| } | ||
| Path target = Paths.get(targetPath).normalize(); | ||
|
|
||
| // 确保源文件路径在数据集目录下,防止路径穿越 | ||
| if (!source.startsWith(base)) { | ||
| log.warn("Source file path {} is outside of allowed base directory {}", source, base); | ||
| throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR); | ||
| } | ||
|
|
||
| // 检查源文件是否存在且为普通文件 | ||
| if (!Files.exists(source) || !Files.isRegularFile(source)) { | ||
| log.warn("Source file does not exist or is not a regular file: {}", sourPath); | ||
| @@ -910,13 +919,20 @@ | ||
| LocalDateTime currentTime = LocalDateTime.now(); | ||
| String fileName = sourcePath.getFileName().toString(); | ||
|
|
||
| Path datasetRoot = Paths.get(dataset.getPath()).normalize().toAbsolutePath(); | ||
| Path targetDir = datasetRoot.resolve(req.getPrefix()).normalize(); | ||
| // 防止 prefix 将文件放置到数据集目录之外 | ||
| if (!targetDir.startsWith(datasetRoot)) { | ||
| throw BusinessException.of(DataManagementErrorCode.DIRECTORY_NOT_FOUND); | ||
| } | ||
|
|
||
| return DatasetFile.builder() | ||
| .id(UUID.randomUUID().toString()) | ||
| .datasetId(dataset.getId()) | ||
| .fileName(fileName) | ||
| .fileType(AnalyzerUtils.getExtension(fileName)) | ||
| .fileSize(sourceFile.length()) | ||
| .filePath(Paths.get(dataset.getPath(), req.getPrefix(), fileName).toString()) | ||
| .filePath(targetDir.resolve(fileName).toString()) | ||
| .uploadTime(currentTime) | ||
| .lastAccessTime(currentTime) | ||
| .metadata(objectMapper.writeValueAsString(file.getMetadata())) |
| Path parent = target.getParent(); | ||
| // 创建目标目录(如果需要) | ||
| if (parent != null) { | ||
| Files.createDirectories(parent); |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 4 days ago
General approach: ensure any path derived from user input is constrained to a safe base directory. For this code, that means:
- Construct the target path for dataset files strictly under
dataset.getPath()using only sanitized components (dataset path + safe prefix + file name). - Validate that the resolved target path remains under the dataset base path after
normalize(). - Prevent arbitrary source paths if they are user-controlled in a risky way (here, they appear to be absolute paths to existing files, which may be acceptable in a trusted environment; we will at least constrain the target).
Best concrete fix with minimal behavior change:
-
Introduce a small helper
resolveSafeDatasetPath(Dataset dataset, String prefix, String fileName)inDatasetFileApplicationServicethat:- Builds
baseasPaths.get(dataset.getPath()).toAbsolutePath().normalize(). - Builds
relativefromprefixandfileName(treating empty prefix correctly). - Combines them into
resolved = base.resolve(relative).normalize(). - Verifies
resolved.startsWith(base); if not, logs and throwsBusinessException.of(DataManagementErrorCode.DIRECTORY_NOT_FOUND)(or another appropriate error). - Returns
resolved.
- Builds
-
Use this helper in
getDatasetFileForAddinstead ofPaths.get(dataset.getPath(), req.getPrefix(), fileName).toString(). This ensures the path stored inDatasetFile.filePathis guaranteed to reside under the dataset directory. -
Additionally, harden
addFileso that when it is used for dataset operations, the target path has already been checked. SinceaddFilereceives only strings and we should not assume more context, we keep its current behavior but rely on the fact that, for this flow,targetPathis now sanitized via step 1. We keep the normalization and directory creation as-is.
All changes are localized to DatasetFileApplicationService.java; no modifications are needed in DatasetFileController.java or DatasetFile.java, and no new external dependencies are required.
| @@ -824,6 +824,30 @@ | ||
| } | ||
|
|
||
| /** | ||
| * 基于数据集根路径、安全前缀和文件名解析出安全的文件路径。 | ||
| * 确保最终路径仍位于数据集目录之下,防止目录遍历。 | ||
| */ | ||
| private Path resolveSafeDatasetPath(Dataset dataset, String prefix, String fileName) { | ||
| Path basePath = Paths.get(dataset.getPath()).toAbsolutePath().normalize(); | ||
|
|
||
| Path relativePath; | ||
| if (StringUtils.isBlank(prefix)) { | ||
| relativePath = Paths.get(fileName); | ||
| } else { | ||
| relativePath = Paths.get(prefix, fileName); | ||
| } | ||
|
|
||
| Path resolved = basePath.resolve(relativePath).normalize(); | ||
|
|
||
| if (!resolved.startsWith(basePath)) { | ||
| log.warn("Resolved dataset file path {} escapes base path {}", resolved, basePath); | ||
| throw BusinessException.of(DataManagementErrorCode.DIRECTORY_NOT_FOUND); | ||
| } | ||
|
|
||
| return resolved; | ||
| } | ||
|
|
||
| /** | ||
| * 添加文件到数据集(仅创建数据库记录,不执行文件系统操作) | ||
| * | ||
| * @param datasetId 数据集id | ||
| @@ -902,21 +926,23 @@ | ||
| } | ||
| } | ||
|
|
||
| private static DatasetFile getDatasetFileForAdd(AddFilesRequest req, AddFilesRequest.FileRequest file, | ||
| Dataset dataset, ObjectMapper objectMapper) throws JsonProcessingException { | ||
| private DatasetFile getDatasetFileForAdd(AddFilesRequest req, AddFilesRequest.FileRequest file, | ||
| Dataset dataset, ObjectMapper objectMapper) throws JsonProcessingException { | ||
| Path sourcePath = Paths.get(file.getFilePath()); | ||
| File sourceFile = sourcePath.toFile(); | ||
| file.getMetadata().put("softAdd", req.isSoftAdd()); | ||
| LocalDateTime currentTime = LocalDateTime.now(); | ||
| String fileName = sourcePath.getFileName().toString(); | ||
|
|
||
| Path safeTargetPath = resolveSafeDatasetPath(dataset, req.getPrefix(), fileName); | ||
|
|
||
| return DatasetFile.builder() | ||
| .id(UUID.randomUUID().toString()) | ||
| .datasetId(dataset.getId()) | ||
| .fileName(fileName) | ||
| .fileType(AnalyzerUtils.getExtension(fileName)) | ||
| .fileSize(sourceFile.length()) | ||
| .filePath(Paths.get(dataset.getPath(), req.getPrefix(), fileName).toString()) | ||
| .filePath(safeTargetPath.toString()) | ||
| .uploadTime(currentTime) | ||
| .lastAccessTime(currentTime) | ||
| .metadata(objectMapper.writeValueAsString(file.getMetadata())) |
| if (parent != null) { | ||
| Files.createDirectories(parent); | ||
| } | ||
| Files.deleteIfExists(target); |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 4 days ago
In general, to fix uncontrolled path usage, we must (1) constrain all user-controlled path components to a designated base directory and (2) validate or normalize those components to prevent path traversal (such as .. segments or absolute paths). The checks should be done before any filesystem operations, and they should enforce that the final resolved path is still under the intended root.
In this code, the unsafe pieces are:
file.getFilePath()(used as the source path).req.getPrefix()(used to construct the targetfilePathand thustargetPathpassed toaddFile).
The safest minimal change, without altering existing business behavior, is:
- Restrict the target path (the dataset file location) to be under
dataset.getPath()by:- Building the path with
Paths.get(dataset.getPath()).resolve(req.getPrefix()).resolve(fileName).normalize(). - Ensuring that the normalized result starts with the normalized dataset root.
- If it does not, reject the request with an appropriate
BusinessException.
- Building the path with
- In
addFile, verify thattargetis within the dataset root directory (and optionally that it is not a symlink) before creating directories, deleting, linking, or copying. SinceaddFilecurrently receives just strings, we can pass the dataset path (root) into it as an additional argument or reconstruct the dataset root fromtargetPath. To stay within minimal functional changes, we will pass adatasetRootstring from the caller (where we havedataset.getPath()). - For the source path (
sourPath), we should at least prevent obviously dangerous values such as blank, relative with.., or containing illegal characters, because the code expects to link/copy from local files. If in practice only server-generated paths are used, this mainly hardens against misuse; but we will add a simple validation: the source path must be absolute and normalized, and we will not allow it to be under system-critical directories (we will keep this lightweight to avoid breaking expected behavior).
Concretely:
- Modify
getDatasetFileForAddinDatasetFileApplicationServiceto:- Compute
Path datasetRoot = Paths.get(dataset.getPath()).normalize(); - Compute
Path prefixPath = StringUtils.isBlank(req.getPrefix()) ? datasetRoot : datasetRoot.resolve(req.getPrefix()).normalize(); - Ensure
prefixPath.startsWith(datasetRoot); otherwise throwBusinessException.of(DataManagementErrorCode.DIRECTORY_NOT_FOUND)(or a more appropriate code if available). - Compute
Path targetPath = prefixPath.resolve(fileName).normalize(); - Ensure
targetPath.startsWith(datasetRoot); otherwise throw the same exception. - Use
targetPath.toString()forfilePathin the builder.
- Compute
- Modify
addFilesToDatasetto pass the dataset root path intoaddFile:addFile(file.getFilePath(), datasetFile.getFilePath(), req.isSoftAdd(), dataset.getPath()); - Update the
addFilemethod signature to include the dataset root and:- Normalize
datasetRoottoPath datasetRootPath = Paths.get(datasetRoot).normalize().toAbsolutePath(); - Normalize
sourceandtargetto absolute paths. - Ensure
target.startsWith(datasetRootPath); if not, log and throwBusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR)before any filesystem write. - Optionally perform a simple validation on
sourPath: reject if it contains..segments when normalized, or if it is not absolute.
- Normalize
These changes ensure that, even if a malicious caller provides a crafted prefix or file path, the service will not write outside the dataset’s directory tree.
| @@ -848,7 +848,7 @@ | ||
| setDatasetFileId(datasetFile, dataset); | ||
| dataset.addFile(datasetFile); | ||
| addedFiles.add(datasetFile); | ||
| addFile(file.getFilePath(), datasetFile.getFilePath(), req.isSoftAdd()); | ||
| addFile(file.getFilePath(), datasetFile.getFilePath(), req.isSoftAdd(), dataset.getPath()); | ||
| } | ||
| } catch (BusinessException e) { | ||
| throw e; | ||
| @@ -863,13 +863,20 @@ | ||
| return addedFiles; | ||
| } | ||
|
|
||
| private void addFile(String sourPath, String targetPath, boolean softAdd) { | ||
| if (StringUtils.isBlank(sourPath) || StringUtils.isBlank(targetPath)) { | ||
| private void addFile(String sourPath, String targetPath, boolean softAdd, String datasetRoot) { | ||
| if (StringUtils.isBlank(sourPath) || StringUtils.isBlank(targetPath) || StringUtils.isBlank(datasetRoot)) { | ||
| return; | ||
| } | ||
| Path source = Paths.get(sourPath).normalize(); | ||
| Path target = Paths.get(targetPath).normalize(); | ||
| Path datasetRootPath = Paths.get(datasetRoot).normalize().toAbsolutePath(); | ||
| Path source = Paths.get(sourPath).normalize().toAbsolutePath(); | ||
| Path target = Paths.get(targetPath).normalize().toAbsolutePath(); | ||
|
|
||
| // 确保目标路径在数据集根目录下,防止目录遍历或任意路径写入 | ||
| if (!target.startsWith(datasetRootPath)) { | ||
| log.warn("Target path is outside of dataset root. datasetRoot={}, target={}", datasetRootPath, target); | ||
| throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR); | ||
| } | ||
|
|
||
| // 检查源文件是否存在且为普通文件 | ||
| if (!Files.exists(source) || !Files.isRegularFile(source)) { | ||
| log.warn("Source file does not exist or is not a regular file: {}", sourPath); | ||
| @@ -910,13 +914,31 @@ | ||
| LocalDateTime currentTime = LocalDateTime.now(); | ||
| String fileName = sourcePath.getFileName().toString(); | ||
|
|
||
| // 构建并校验目标路径,确保始终位于数据集根目录下 | ||
| Path datasetRootPath = Paths.get(dataset.getPath()).normalize().toAbsolutePath(); | ||
| Path prefixPath; | ||
| if (StringUtils.isBlank(req.getPrefix())) { | ||
| prefixPath = datasetRootPath; | ||
| } else { | ||
| prefixPath = datasetRootPath.resolve(req.getPrefix()).normalize(); | ||
| } | ||
| if (!prefixPath.startsWith(datasetRootPath)) { | ||
| log.warn("Invalid prefix for dataset. datasetRoot={}, prefix={}", datasetRootPath, req.getPrefix()); | ||
| throw BusinessException.of(DataManagementErrorCode.DIRECTORY_NOT_FOUND); | ||
| } | ||
| Path targetPath = prefixPath.resolve(fileName).normalize(); | ||
| if (!targetPath.startsWith(datasetRootPath)) { | ||
| log.warn("Computed target path is outside of dataset root. datasetRoot={}, target={}", datasetRootPath, targetPath); | ||
| throw BusinessException.of(DataManagementErrorCode.DIRECTORY_NOT_FOUND); | ||
| } | ||
|
|
||
| return DatasetFile.builder() | ||
| .id(UUID.randomUUID().toString()) | ||
| .datasetId(dataset.getId()) | ||
| .fileName(fileName) | ||
| .fileType(AnalyzerUtils.getExtension(fileName)) | ||
| .fileSize(sourceFile.length()) | ||
| .filePath(Paths.get(dataset.getPath(), req.getPrefix(), fileName).toString()) | ||
| .filePath(targetPath.toString()) | ||
| .uploadTime(currentTime) | ||
| .lastAccessTime(currentTime) | ||
| .metadata(objectMapper.writeValueAsString(file.getMetadata())) |
| if (softAdd) { | ||
| // 优先尝试创建硬链接,失败后尝试创建符号链接;若均失败抛出异常 | ||
| try { | ||
| Files.createLink(target, source); |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 4 days ago
General approach: ensure that any filesystem path derived from user-controlled data is constrained to a safe base directory. Here, the safe base is dataset.getPath(). We should (1) construct the target file path based on dataset.getPath() and user input, (2) normalize and convert both base and target to absolute paths, and (3) verify that the target is still under the base (startsWith). If not, reject the request.
Best concrete fix while preserving behavior:
-
In
getDatasetFileForAdd, when building theDatasetFile.filePathfromdataset.getPath(),req.getPrefix(), andfileName, we already usePaths.get(...). We should:- Normalize and make the resulting path absolute.
- Ensure it remains under the dataset root directory (
dataset.getPath()). - If it escapes, throw a
BusinessExceptionusing an appropriate error code (e.g.,SystemErrorCode.FILE_SYSTEM_ERRORor an existing data-management error code). - Store the validated, normalized path string in
filePath.
-
In
addFile, before usingtargetin filesystem operations, we should:- Require the caller to pass the dataset base path (or, equivalently, compute it here if available), but we cannot change method signature without touching callers across the codebase. However, in this code path the only usage we see is
addFile(file.getFilePath(), datasetFile.getFilePath(), ...), wheredatasetFile.getFilePath()will already have been validated by the improvedgetDatasetFileForAdd. Thus, tighteninggetDatasetFileForAddis sufficient to ensuretargetis safe. Still, we can hardenaddFilefurther by re-normalizingtargetand checking that it’s absolute (to avoid relative surprises); this does not require knowing the dataset root, but containment is already enforced earlier.
- Require the caller to pass the dataset base path (or, equivalently, compute it here if available), but we cannot change method signature without touching callers across the codebase. However, in this code path the only usage we see is
Given the constraints to only edit the shown snippets and not alter external contracts, the minimally invasive, effective fix is to:
- Update
getDatasetFileForAddinDatasetFileApplicationServiceto:- Build a normalized
Path datasetRoot = Paths.get(dataset.getPath()).normalize().toAbsolutePath(); - Build
Path targetPath = datasetRoot.resolve(req.getPrefix()).resolve(fileName).normalize().toAbsolutePath(); - Verify that
targetPath.startsWith(datasetRoot); otherwise throw aBusinessException(e.g.,SystemErrorCode.FILE_SYSTEM_ERROR). - Set
filePathtotargetPath.toString()instead of the previousPaths.get(...).toString().
- Build a normalized
addFile can remain logically the same, because it will now receive a targetPath that has already been validated to stay inside datasetRoot. This directly addresses the CodeQL alert on line 889: although target is still tainted, it is constrained to a safe directory, which removes the security risk and should satisfy the analyzer once it recognizes the containment check.
No changes are needed in DatasetFileController or DatasetFile model for this specific path traversal issue.
| @@ -910,13 +910,28 @@ | ||
| LocalDateTime currentTime = LocalDateTime.now(); | ||
| String fileName = sourcePath.getFileName().toString(); | ||
|
|
||
| // Build and validate the target path to ensure it stays under the dataset root | ||
| Path datasetRoot = Paths.get(dataset.getPath()).normalize().toAbsolutePath(); | ||
| Path targetPath = datasetRoot | ||
| .resolve(StringUtils.defaultString(req.getPrefix())) | ||
| .resolve(fileName) | ||
| .normalize() | ||
| .toAbsolutePath(); | ||
|
|
||
| // Ensure the normalized target path is still within the dataset root directory | ||
| if (!targetPath.startsWith(datasetRoot)) { | ||
| log.warn("Invalid target path when adding file to dataset. root={}, target={}, prefix={}, fileName={}", | ||
| datasetRoot, targetPath, req.getPrefix(), fileName); | ||
| throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR); | ||
| } | ||
|
|
||
| return DatasetFile.builder() | ||
| .id(UUID.randomUUID().toString()) | ||
| .datasetId(dataset.getId()) | ||
| .fileName(fileName) | ||
| .fileType(AnalyzerUtils.getExtension(fileName)) | ||
| .fileSize(sourceFile.length()) | ||
| .filePath(Paths.get(dataset.getPath(), req.getPrefix(), fileName).toString()) | ||
| .filePath(targetPath.toString()) | ||
| .uploadTime(currentTime) | ||
| .lastAccessTime(currentTime) | ||
| .metadata(objectMapper.writeValueAsString(file.getMetadata())) |
| } catch (Throwable hardEx) { | ||
| log.warn("create hard link failed from {} to {}: {}", source, target, hardEx.getMessage()); | ||
| } | ||
| Files.createSymbolicLink(target, source); |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 4 days ago
In general terms, the fix is to ensure that any filesystem path derived from user input is validated before being used as a target for file operations. The usual pattern is: build the path relative to a known safe base directory, normalize it, then check that the resulting absolute path is still within the base directory. If the check fails, reject the request.
In this case, the key unsafe operation is in addFile(String sourPath, String targetPath, boolean softAdd) in DatasetFileApplicationService. Here targetPath ultimately comes from Paths.get(dataset.getPath(), req.getPrefix(), fileName).toString() which includes the user-controlled req.getPrefix(). The best fix with minimal functional impact is:
- Add a reference to the dataset base directory into
addFile, by passingdataset.getPath()as an extra argument when calling it fromaddFilesToDataset. - Inside
addFile, resolve and normalize thetargetpath against the trusted base directory, then ensure the normalized path starts with that base directory. If not, throw aBusinessException(e.g., using an existing error code such asSystemErrorCode.FILE_SYSTEM_ERRORor a more specific one if desired). - Keep source path behavior as-is, because the described vulnerability is specifically about the target path that the service writes to within the dataset.
Concretely:
- Change the signature of
addFilefromprivate void addFile(String sourPath, String targetPath, boolean softAdd)toprivate void addFile(String sourPath, String targetPath, boolean softAdd, String datasetBasePath). - Adjust the call in
addFilesToDatasetto passdataset.getPath()as the new argument. - Inside
addFile:- Compute
Path baseDir = Paths.get(datasetBasePath).toAbsolutePath().normalize(); - Compute
Path target = baseDir.resolve(targetPath).normalize(); - Check
if (!target.startsWith(baseDir)) { log.warn(...); throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR); } - Proceed with directory creation and link/copy using this validated
target.
- Compute
- Keep other behavior (softAdd, hard link fallback, etc.) unchanged.
This stays within the existing files (DatasetFileApplicationService) and uses only standard JDK APIs (java.nio.file.Paths, Path.startsWith), so no new external dependencies are required.
| @@ -848,7 +848,8 @@ | ||
| setDatasetFileId(datasetFile, dataset); | ||
| dataset.addFile(datasetFile); | ||
| addedFiles.add(datasetFile); | ||
| addFile(file.getFilePath(), datasetFile.getFilePath(), req.isSoftAdd()); | ||
| // 使用数据集根路径作为安全基准目录,防止目标路径越权 | ||
| addFile(file.getFilePath(), datasetFile.getFilePath(), req.isSoftAdd(), dataset.getPath()); | ||
| } | ||
| } catch (BusinessException e) { | ||
| throw e; | ||
| @@ -863,13 +864,20 @@ | ||
| return addedFiles; | ||
| } | ||
|
|
||
| private void addFile(String sourPath, String targetPath, boolean softAdd) { | ||
| private void addFile(String sourPath, String targetPath, boolean softAdd, String datasetBasePath) { | ||
| if (StringUtils.isBlank(sourPath) || StringUtils.isBlank(targetPath)) { | ||
| return; | ||
| } | ||
| Path source = Paths.get(sourPath).normalize(); | ||
| Path target = Paths.get(targetPath).normalize(); | ||
| Path baseDir = Paths.get(datasetBasePath).toAbsolutePath().normalize(); | ||
| Path target = baseDir.resolve(targetPath).normalize(); | ||
|
|
||
| // 确保目标路径仍在数据集根目录之下,防止目录遍历 | ||
| if (!target.startsWith(baseDir)) { | ||
| log.warn("Target path is outside of dataset base directory. baseDir={}, target={}", baseDir, target); | ||
| throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR); | ||
| } | ||
|
|
||
| // 检查源文件是否存在且为普通文件 | ||
| if (!Files.exists(source) || !Files.isRegularFile(source)) { | ||
| log.warn("Source file does not exist or is not a regular file: {}", sourPath); |
| Files.createSymbolicLink(target, source); | ||
| } else { | ||
| // 覆盖已存在的目标文件,保持与其他地方行为一致 | ||
| Files.copy(source, target); |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 4 days ago
In general terms, the fix is to validate and constrain any user‑provided path before using it in file system operations. For addFile(String sourPath, String targetPath, boolean softAdd), we should ensure that sourPath is either: (a) under a known, trusted base directory (for example, a configured upload or staging directory), and that its normalized form does not escape that directory; or (b) at least not an absolute path and not containing .. path traversal components if the design expects only relative paths. The same normalized‑under‑base check is already used elsewhere in the project (e.g., in directory download/delete flows) and is a standard pattern.
The best fix here without changing higher‑level behavior is:
- Introduce a base directory field in
DatasetFileApplicationService(e.g., injected from configuration via@Value("${datamanagement.add-files.source-base-path:}")or similar). Since we cannot safely assume config keys, and we must not change behavior too much, a safer minimal change is to enforce that the source path is not absolute and does not contain..components. This keeps the semantics “copy/link from a path the caller provides”, but blocks obvious path traversal and system‑file access by absolute paths. - In
addFile, after buildingPath source = Paths.get(sourPath).normalize();, add validation:- Reject if
source.isAbsolute(). - Reject if the normalized path contains any
..segment (can be detected by scanningsource’s name elements).
- Reject if
- Optionally log and throw
BusinessExceptionwith an appropriate error code when validation fails.
These changes are all local to DatasetFileApplicationService.addFile. No other files need modification. No new external libraries are required; we rely solely on java.nio.file.Path methods and existing exception types.
Concretely, in DatasetFileApplicationService.java:
- In the
addFilemethod (lines 866–903), after computingsource, insert validation logic to ensure it is a relative path without... If validation fails, log and throwBusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR)(or a more specific error code if desired). - Keep the rest of the method as‑is so functionality (copy/link behavior, error handling) remains unchanged except for rejecting unsafe source paths.
| @@ -870,6 +870,18 @@ | ||
| Path source = Paths.get(sourPath).normalize(); | ||
| Path target = Paths.get(targetPath).normalize(); | ||
|
|
||
| // 校验源路径,防止路径遍历和访问任意绝对路径 | ||
| if (source.isAbsolute()) { | ||
| log.warn("Rejected absolute source path when adding file: {}", source); | ||
| throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR); | ||
| } | ||
| for (Path part : source) { | ||
| if ("..".equals(part.toString())) { | ||
| log.warn("Rejected source path with parent directory reference when adding file: {}", source); | ||
| throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR); | ||
| } | ||
| } | ||
|
|
||
| // 检查源文件是否存在且为普通文件 | ||
| if (!Files.exists(source) || !Files.isRegularFile(source)) { | ||
| log.warn("Source file does not exist or is not a regular file: {}", sourPath); |
| Files.createSymbolicLink(target, source); | ||
| } else { | ||
| // 覆盖已存在的目标文件,保持与其他地方行为一致 | ||
| Files.copy(source, target); |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 4 days ago
In general, to fix uncontrolled path usage you must (1) validate any user-controlled path components and/or (2) constrain resulting paths to a known safe base directory after normalization. For this code, the vulnerable sink is the addFile method: it takes two string paths where both arguments can be influenced by the request, resolves them via Paths.get, then performs file operations.
Best fix while preserving current behavior:
- Treat the dataset’s directory (
dataset.getPath()) as the only allowed root for target paths. - Normalize and resolve the constructed
targetpath against that root, and ensure it does not escape the dataset root (startsWithcheck). - Optionally, perform minimal sanity checks on the client-supplied
file.getFilePath()to avoid obviously dangerous sources, even if they might be semi-trusted already.
Because addFile currently only receives String sourPath and String targetPath, we need to pass the dataset root directory into it so it can enforce the containment check. We can do this with minimal change by:
- Introducing a new private helper
addFileWithinDataset(Path datasetRoot, String sourPath, String targetPath, boolean softAdd)that:- Validates
sourPathandtargetPathare non-blank (as today). - Computes
Path source = Paths.get(sourPath).normalize().toAbsolutePath(); - Computes
Path datasetRootAbs = datasetRoot.normalize().toAbsolutePath(); - Computes
Path target = datasetRootAbs.resolve(targetPath).normalize(); - Ensures
target.startsWith(datasetRootAbs); if not, log and throwBusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR)(or a more specific error). - Proceeds with the existing existence and copy/link logic, but uses the validated
targetandsource.
- Validates
- Updating
addFilesToDatasetso that instead of callingaddFile(file.getFilePath(), datasetFile.getFilePath(), req.isSoftAdd());it callsaddFileWithinDataset(Paths.get(dataset.getPath()), file.getFilePath(), datasetFile.getFilePath(), req.isSoftAdd());. - Optionally, we can leave the old
addFilemethod in place but delegate to the new, safer version using anullbase only internally if we know all callers can supply a dataset root. To avoid changing other parts of the code we haven’t seen, we’ll keepaddFileand make it a thin wrapper requiring a base path, but since the only shown call is fromaddFilesToDataset, we can simply stop usingaddFilefrom there and keep its behavior unchanged for any unknown callers.
This approach keeps all external functionality (copy vs softAdd, overwrite behavior, etc.) the same, but ensures that the target file is always inside the dataset’s directory, preventing path traversal via req.getPrefix() or any future uses of DatasetFile.filePath.
No description provided.